Introduction

The Academy Awards, also known as the Oscars, are arguably the ultimate award for any movie. I plan to use the data I have collected to investigate the relationship between Oscars and money. Do Oscar-winning movies cost more to make than movies that lose Oscars? Do both of these sets of movies cost more than movies that are not nominated? Do Oscar winners, Oscar losers, or non-nominated movies make more gross revenue? Which of these has the highest percentage profit? I will use the data to answer these questions and more. Along the way, I will also try to highlight some interesting statistics and facts. By analyzing this data, I hope to shed some light on whether production companies pay for Oscars, profit from Oscars, both, or neither.

Also, by choosing a fun topic, I hope that people looking at this report find answers that they enjoy thinking about. I certainly had fun looking for my own answers.


Data Preparation

Most of these nine libraries are used extensively, while a couple have only one or two functions that came in handy for things I wanted to accomplish in the presentation.

library(tidyverse) # multiple tidy packages
library(readr) # for reading files
library(ggplot2) # for plotting
library(plotly) # for interactive plotting
library(dplyr) # data manipulation
library(flextable) # pretty tables
library(formattable) # comma function formatting numbers
library(gridExtra) # side by side plots
library(kableExtra) # for scrolling table

Here I read in the four data sets I wound up using. I give a brief explanation of each below.

budget <- read_csv('movies_budget.csv') # budget including new movies
oscars <- read_csv('the_oscar_award.csv') # oscar winners and losers

meta <- read_csv('movies_metadata.csv') # majority of data used

inflation <- read_csv('inflation_data.csv') 
# inflation data for calculations

Paying for Oscars or Oscars for Pay?

  • The largest part of my data comes from a data set I got from GitHub. The GitHub user yash91sharma created the data for a project similar to mine, in which they asked which countries made the most movies. This data set contains 45,466 rows of data and seems to have been made around 2017.

  • I used a Kaggle data set from alexsychu to gather some more data up to 2020. This second data set had an additional 7,668 rows of data. This user was trying to answer the question of how to predict if a movie will do well.

  • Lastly for movie data, I used a Kaggle data set from unanimad. This data set had 10,395 rows of data, and this Kaggle user asked various questions about who won Oscars.

  • My last and only ‘non-movie’ data set comes from officialdata.org, where I obtained 223 rows of inflation data, which allowed me to calculate budgets and gross revenue with inflation as a factor.


Data Wrangling

There were many renames, and in some cases it took some work to choose a movie budget. Once I had the columns I wanted, I set about joining the data together. First, I joined the meta data with the budget data to include all of the titles for which I had information. My next adventure (or misadventure) was to join this data with the inflation data. Once that was done, all that was left was joining this information with the Oscar data. All in all, I used three full joins.
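As a rough illustration of what a full join does, here is a minimal sketch on made-up data. The two small tables and their values are hypothetical; only `full_join()` and its `by` argument come from dplyr. Rows without a match on either side are kept and padded with NA, which is why the row count can grow with each join.

```r
library(dplyr)

# Hypothetical stand-ins for the movie and Oscar tables
movies <- data.frame(Title = c("A", "B"), budget = c(10, 20))
awards <- data.frame(Title = c("B", "C"), winner = c(TRUE, FALSE))

# Keep every Title from both tables; cells with no match become NA
joined <- full_join(movies, awards, by = "Title")
print(joined)
```

In this toy case, movie A has no Oscar row, so its `winner` is NA, which is the same way non-nominated movies are identified later in this report.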

oscars <- oscars %>%    # rename year for join
  rename('year' = 'year_film')

meta$year <-  format(as.Date(meta$release_date, format = "%m/%d/%Y"), "%Y")

blue = '#000080' # just the color code I wanted to use for my print
  
meta <- meta %>% 
  mutate(year = as.numeric(year)) # easier use for comparisons/inflation

# renames
budget <- budget %>% 
  rename(vote_average = score) %>% # renaming vote data for joins etc
  rename(vote_count = votes) %>% 
  rename(Title = 'Movie Title')


oscars <- oscars %>% 
  rename(Title = 'film')  # rename for joins

adj <- inflation %>% 
  mutate(multiplier = (22.82/amount)) 
# create multiplier column for easy calculations

budget <- budget %>% 
  rename(new_budget = Budget)  # rename for joins etc


options(scipen = 100) # avoid scientific notation 

full_budget <- full_join(meta, budget) 
# full join of metadata and budget data to get new and old movies etc
# dplyr joins take `by`, not `on`; with `by` left out, the join
# matches on every column the two tables share

full_bud <- full_budget %>% 
  mutate(budget = pmax(new_budget, meta_budget, na.rm = T)) %>% 
  select(Title,genres, budget,new_budget,
         meta_budget, popularity,year,
         release_date, revenue, runtime, vote_average,
         vote_count,gross)
# set budget to max of two different budgets.
# picking max is arbitrary, but needed in most cases


full_bud <- full_bud[-c(1,2,3),] %>%
   arrange(desc(as.numeric(budget)))
# remove first three unnecessary rows


full_bud <- full_bud %>%  # replace 1900 sentinels with 2022
  mutate(year = replace(year, year == 1900, 2022))
# Most of these seemed to be less known movies
# so I thought that 2022 would do the least harm 
# with the inflation numbers

adj_bud <- full_join(full_bud, adj, by = 'year') %>% 
  mutate(with_inflation = as.numeric(budget) * multiplier) %>% 
  mutate(gross_inflation = as.numeric(gross) * multiplier) %>% 
  select(Title,genres,vote_average, vote_count, budget,
         with_inflation, gross_inflation, year, gross,
         release_date) %>% 
  arrange(desc(with_inflation))  # joining movies with inflation
# joined on year, since the inflation data has no Title column
# (dplyr joins take `by`, not `on`)
# creating with_inflation(budget) and gross_inflation columns
# arranging by highest with_inflation budgets
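To make the multiplier concrete, here is a small worked sketch. The constant 22.82 comes from the code above (it is presumably the `amount` value for 2022, although the source does not say so). The 1997 index of 12.74 and the $200,000,000 nominal budget are assumptions chosen for illustration; they happen to reproduce the inflation-adjusted Titanic figure reported later in this report.

```r
base_amount    <- 22.82      # constant used in the multiplier above
amount_1997    <- 12.74      # assumed index value for 1997 (not from the source)
nominal_budget <- 200000000  # assumed nominal budget (not from the source)

# Same calculation as the multiplier and with_inflation columns
multiplier <- base_amount / amount_1997
adjusted   <- nominal_budget * multiplier

print(format(round(adjusted, 2), big.mark = ",", nsmall = 2))
```

A year whose `amount` equals 22.82 gets a multiplier of 1, so its budgets are left unchanged.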


osc_bud <- full_join(adj_bud, oscars, by = c('Title', 'year'))
# joining all other data to Oscars data on the two shared columns
# (dplyr joins take `by`, not `on`)
# This is the starting point for most of my data manipulation
# 62706 entries, 15 total columns, of which 8 columns used.

I wound up only using 8 of the 15 columns of my final data set. A few of these columns were created by manipulating other data. I had planned to look at more aspects, but I eventually realized that was too much for this presentation. Maybe someone else will see what I have done, and decide to dig deeper. Maybe I will come back to it someday myself.


Some Dataset Numbers

oscar_titles <- oscars %>% # count number of titles in oscar data
  distinct(Title)

oscar_wn_titles <- oscars %>% # count number of oscar winning movies
  filter(winner == T) %>% 
  distinct(Title)

total_noms <- osc_bud %>% # count total number of nominations
  filter(!is.na(winner))


total_wins <- osc_bud %>% # count total number of awards won
  filter(winner == T)


bst_pic <-  osc_bud %>% 
  group_by(Title) %>% 
  filter((str_detect(category, 'PIC') | 
            str_detect(category, 'OUT'))) %>% 
  select(Title, year,budget, with_inflation,
         gross_inflation, gross, genres, vote_count,
         vote_average, category, winner) %>% 
  distinct(Title, .keep_all = T)
# select best picture nominees in its many forms


bst_pic_won <- bst_pic %>%  # count best picture winners
  filter(winner == T) %>% 
  distinct(Title, .keep_all = T)

bst_act <- osc_bud %>%  # count best actor/actress titles
  group_by(Title) %>% 
  filter(str_detect(category, 'ACT')) %>% 
  select(Title, genres, budget, category, winner)

bst_act_wn <- bst_act %>% 
  filter(!is.na(winner)) %>% 
  filter(winner == T) %>% 
  distinct(Title, .keep_all = T) # count winners of actor/actress

The code above produces the numbers reported here. The data I was able to collect contains 4,834 movies nominated for an Academy Award since its inception in 1929. Of these 4,834 movies, 1,274 won at least one award. There have been 13,312 total Oscar nominations and 3,001 total Oscar wins in the dataset. 559 movies have been nominated for Best Picture in its many forms. Of these, 92 won. 1,154 movies have had an actor or actress nominated in either a leading or supporting role. 313 movies had at least one winner in an acting category.


Plotting Profit

The first graph is a scatter plot of budget vs gross. Because there were so many non-nominated movies, I also made the scatter plot with them removed. The last plot shows the same information adjusted for inflation. The black line in each plot is the break-even line, where gross equals budget.

osc_bud <- osc_bud[!is.na(osc_bud$budget), ] # remove na budgets


osc_bud <- osc_bud %>% 
  mutate(budget = as.numeric(budget),
         gross = as.numeric(gross)) # make both plot axes numeric

my_theme = theme(axis.text.x = element_text(angle = -90,
                                   size = 5, color = blue ),
        axis.text.y = element_text(color = blue)) +
  theme(axis.title.x = element_text(color = blue),
        axis.title.y = element_text(color = blue))

p <- osc_bud %>% select(Title, genres, gross, budget, winner) %>% 
  ggplot(aes(x = budget, y = gross, color = winner, label = Title)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point(alpha = 0.1) + # alpha is set on the geom; ggplot() ignores it
  my_theme  # budget vs gross

ggplotly(p)

p2 <- osc_bud %>% select(Title, genres, gross, budget, winner) %>% 
  filter(!is.na(winner)) %>% 
  ggplot(aes(x = budget, y = gross, color = winner, label = Title)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point(alpha = 0.1) +
  my_theme # non-nominated removed

ggplotly(p2)

p3 <- osc_bud %>% select(Title, genres, gross, budget,with_inflation, gross_inflation, winner) %>% 
  filter(!is.na(winner)) %>% 
  ggplot(aes(x = with_inflation, y = gross_inflation, color = winner, label = Title)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point(alpha = 0.1) +
  my_theme

ggplotly(p3) # with inflation

The scatter plot above plots budget vs gross. Oscar winners are in blue, Oscar losers in red and non-nominated in grey. These colors will mean the same thing throughout this report.


Highest Percent Profit

high_prof <- osc_bud %>% 
  filter(budget != 0) %>% 
  # "percent profit" here is gross as a percentage of budget,
  # so a value of 100 means the movie broke even
  mutate(pct_prof = 100 * (as.numeric(gross)/as.numeric(budget))) %>% 
  select(Title, budget, gross, pct_prof) %>% 
  arrange(desc(pct_prof)) %>% 
  distinct(Title, .keep_all = T)

my_table <-  kable(head(high_prof, 100),
      format.args = list(big.mark = ",")) %>%
  kable_styling(
    font_size = 15,
    bootstrap_options = c("striped", "hover", "condensed")
  ) 

 
scroll_box(my_table, height = '500px', width = '500px', 
           box_css = "border: 1px solid #ddd; padding: 5px; ",
           extra_css = "color: navy;",
           fixed_thead = TRUE
)
Title budget gross pct_prof
Paranormal Activity 15,000 193,355,800 1,289,038.667
The Blair Witch Project 60,000 248,639,099 414,398.498
The Gallows 100,000 42,964,410 42,964.410
El Mariachi 7,000 2,040,920 29,156.000
Once 150,000 20,936,722 13,957.815
Clerks 27,000 3,151,130 11,670.852
Napoleon Dynamite 400,000 46,138,887 11,534.722
In the Company of Men 25,000 2,804,473 11,217.892
Keeping Mum 169,000 18,586,834 10,998.127
Open Water 500,000 54,683,487 10,936.697
The Devil Inside 1,000,000 101,758,490 10,175.849
The Quiet Ones 200,000 17,835,162 8,917.581
Saw 1,200,000 103,911,669 8,659.306
Searching 880,000 75,462,037 8,575.231
Primer 7,000 545,436 7,791.943
E.T. the Extra-Terrestrial 10,500,000 792,910,554 7,551.529
My Big Fat Greek Wedding 5,000,000 368,744,044 7,374.881
The Full Monty 3,500,000 257,938,649 7,369.676
Friday the 13th 550,000 39,754,601 7,228.109
Fireproof 500,000 33,473,297 6,694.659
Insidious 1,500,000 99,557,032 6,637.135
Unfriended 1,000,000 62,882,090 6,288.209
Paranormal Activity 2 3,000,000 177,512,032 5,917.068
Get Out 4,500,000 255,589,157 5,679.759
Four Weddings and a Funeral 4,400,000 245,700,832 5,584.110
Pi 60,000 3,221,152 5,368.587
Slacker 23,000 1,228,108 5,339.600
Hollywood Shuffle 100,000 5,228,617 5,228.617
The Breakfast Club 1,000,000 51,525,171 5,152.517
Taxi 3 1,300,000 65,497,208 5,038.247
Valley Girl 350,000 17,343,596 4,955.313
Chasing Amy 250,000 12,021,272 4,808.509
Clifford’s Really Big Movie 70,000 3,255,426 4,650.609
A Separation 500,000 22,926,076 4,585.215
Porky’s 2,500,000 111,289,673 4,451.587
The Brothers McMullen 238,000 10,426,506 4,380.885
She’s Gotta Have It 175,000 7,137,502 4,078.573
Annabelle 6,500,000 257,579,282 3,962.758
Look Who’s Talking 7,500,000 296,999,813 3,959.998
The Lives of Others 2,000,000 77,356,942 3,867.847
The Last Exorcism 1,800,000 69,432,527 3,857.363
Chernobyl Diaries 1,000,000 38,390,020 3,839.002
Crocodile Dundee 8,800,000 328,203,506 3,729.585
Saw II 4,000,000 147,748,505 3,693.713
Dirty Dancing 6,000,000 214,577,242 3,576.287
Pretty Woman 14,000,000 463,406,268 3,310.045
Cry Wolf 1,000,000 32,586,408 3,258.641
Insidious: Chapter 2 5,000,000 161,919,318 3,238.386
God’s Not Dead 2,000,000 64,676,349 3,233.817
Juno 7,500,000 232,372,681 3,098.302
Split 9,000,000 278,454,417 3,093.938
Wolf Creek 1,000,000 30,894,796 3,089.480
The Living End 22,769 692,585 3,041.789
Lights Out 4,900,000 148,868,835 3,038.139
Star Wars: Episode V - The Empire Strikes Back 18,000,000 538,375,067 2,990.973
The Purge 3,000,000 89,328,627 2,977.621
Lost in Translation 4,000,000 118,686,937 2,967.173
Paranormal Activity 4 5,000,000 142,802,657 2,856.053
The King’s Speech 15,000,000 427,374,317 2,849.162
Sinister 3,000,000 82,515,113 2,750.504
Truth or Dare 3,500,000 95,330,710 2,723.735
Stranger Than Paradise 90,000 2,436,000 2,706.667
You’re Next 1,000,000 26,895,481 2,689.548
Pulp Fiction 8,000,000 213,928,762 2,674.110
Home Alone 18,000,000 476,684,675 2,648.248
In the Bedroom 1,700,000 44,763,181 2,633.128
Happy Death Day 4,800,000 125,479,266 2,614.151
Deathstalker 457,000 11,919,250 2,608.151
Your Sister’s Sister 125,000 3,242,802 2,594.242
The Fault in Our Stars 12,000,000 307,166,834 2,559.724
Halloween 10,000,000 255,614,941 2,556.149
Black Swan 13,000,000 329,398,046 2,533.831
Slumdog Millionaire 15,000,000 378,410,542 2,522.737
The House on Sorority Row 425,000 10,604,986 2,495.291
Sling Blade 1,000,000 24,444,121 2,444.412
War Room 3,000,000 73,256,266 2,441.876
The Lion King 45,000,000 1,083,720,877 2,408.269
A Haunted House 2,500,000 60,159,584 2,406.383
Magic Mike 7,000,000 167,739,961 2,396.285
Airplane! 3,500,000 83,453,539 2,384.387
Top Gun 15,000,000 357,288,178 2,381.921
American Beauty 15,000,000 356,296,601 2,375.311
Flashdance 4,000,000 92,921,203 2,323.030
Platoon 6,000,000 138,545,632 2,309.094
High School Musical 3: Senior Year 11,000,000 252,909,177 2,299.174
Ghost 22,000,000 505,703,557 2,298.653
Fatal Attraction 14,000,000 320,145,693 2,286.755
Swingers 200,000 4,555,020 2,277.510
Parasite 11,400,000 258,908,054 2,271.123
Beverly Hills Cop 14,000,000 316,360,478 2,259.718
Good Will Hunting 10,000,000 225,933,435 2,259.334
It Follows 1,000,000 21,947,454 2,194.745
Billy Elliot 5,000,000 109,283,018 2,185.660
Indiana Jones and the Raiders of the Lost Ark 18,000,000 389,925,971 2,166.255
The Peanut Butter Falcon 6,200,000 133,031,473 2,145.669
The Fog 1,000,000 21,448,782 2,144.878
American Pie 11,000,000 235,483,004 2,140.755
Ouija 5,000,000 103,687,316 2,073.746
Sex, Lies, and Videotape 1,200,000 24,741,667 2,061.806
A Quiet Place 17,000,000 350,320,413 2,060.708
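
To read the pct_prof column, it helps to redo the first row by hand. This sketch uses the Paranormal Activity figures from the table above; the variable names are only for illustration. Because pct_prof is gross as a percentage of budget, a value of 100 means the film grossed exactly its budget, and subtracting 100 gives the return over cost.

```r
budget <- 15000      # Paranormal Activity budget, from the table
gross  <- 193355800  # Paranormal Activity gross, from the table

pct_prof   <- 100 * gross / budget  # same formula as the code above
net_return <- pct_prof - 100        # percent earned beyond the budget

print(round(pct_prof, 3))
print(round(net_return, 3))
```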

Lowest Percent Profit

low_prof <- high_prof %>% 
  drop_na() %>% # drop rows with any NA; na.omit(budget) ignored its argument
  arrange(pct_prof) %>% 
  distinct(Title, .keep_all = T)

my_table2 <- kable(head(low_prof, 100),
      format.args = list(big.mark = ",")) %>%
  kable_styling(
    font_size = 15,
    bootstrap_options = c("striped", "hover", "condensed")
  ) 


scroll_box(my_table2, height = '500px', width = '500px', 
           box_css = "border: 1px solid #ddd; padding: 5px; ",
           extra_css = "color: navy;",
           fixed_thead = TRUE
)
Title budget gross pct_prof
Trojan War 15,000,000 309 0.0020600
Madadayo 11,900,000 596 0.0050084
Ginger Snaps 5,000,000 2,554 0.0510800
Philadelphia Experiment II 5,000,000 2,970 0.0594000
The Lovers on the Bridge 28,000,000 29,679 0.1059964
Savior 10,000,000 14,328 0.1432800
Tanner Hall 3,000,000 5,073 0.1691000
Crimewave 3,000,000 5,101 0.1700333
Hell’s Kitchen 6,000,000 11,710 0.1951667
Barefoot 6,000,000 15,071 0.2511833
Freaked 11,000,000 29,296 0.2663273
Passion Play 8,000,000 25,603 0.3200375
About Cherry 2,500,000 8,315 0.3326000
Rock & Rule 8,000,000 30,379 0.3797375
Best Laid Plans 7,000,000 27,816 0.3973714
Brenda Starr 16,000,000 67,878 0.4242375
O.C. and Stiggs 7,000,000 29,815 0.4259286
My Summer Story 15,000,000 70,936 0.4729067
The Boondock Saints 6,000,000 30,471 0.5078500
Vamps 16,000,000 92,748 0.5796750
Love Ranch 25,000,000 146,149 0.5845960
Arizona Dream 19,000,000 112,547 0.5923526
The Irishman 159,000,000 968,853 0.6093415
Smooth Talk 2,400,000 16,785 0.6993750
Dominion 30,000,000 251,495 0.8383167
Surfer, Dude 6,000,000 52,132 0.8688667
Postal 15,000,000 146,741 0.9782733
Crackers 12,000,000 129,268 1.0772333
Bloodhounds of Broadway 4,000,000 43,671 1.0917750
The Last Time I Committed Suicide 4,000,000 46,362 1.1590500
Phobia 5,100,000 59,167 1.1601373
There Goes My Baby 10,500,000 123,509 1.1762762
Gentlemen Broncos 10,000,000 118,492 1.1849200
Animal Factory 3,600,000 43,805 1.2168056
Underground 14,000,000 171,082 1.2220143
Revolution 28,000,000 358,574 1.2806214
Fandango 7,000,000 91,666 1.3095143
The Million Dollar Hotel 8,000,000 105,983 1.3247875
Five Days One Summer 15,000,000 199,078 1.3271867
The Specials 1,000,000 13,276 1.3276000
Lawn Dogs 8,000,000 106,404 1.3300500
Committed 3,000,000 40,361 1.3453667
Mad Dog Time 8,000,000 107,874 1.3484250
Eulogy 6,500,000 89,781 1.3812462
Onegin 14,000,000 206,128 1.4723429
Breakfast of Champions 12,000,000 178,278 1.4856500
Tigerland 10,000,000 148,701 1.4870100
Canadian Bacon 11,000,000 163,971 1.4906455
Beasts of No Nation 6,000,000 90,777 1.5129500
Rapa Nui 20,000,000 305,070 1.5253500
Ride with the Devil 38,000,000 635,096 1.6713053
Lolita 62,000,000 1,071,255 1.7278306
Marriage Story 18,600,000 333,686 1.7940108
Liebestraum 6,900,000 133,645 1.9368841
Illegally Yours 13,000,000 259,019 1.9924538
The Weight of Water 16,000,000 321,279 2.0079937
Texas Rangers 38,000,000 763,740 2.0098421
The Beast of War 8,000,000 161,004 2.0125500
Clinton Road 2,500,000 50,400 2.0160000
Ophelia 12,000,000 242,115 2.0176250
The Tempest 20,000,000 405,861 2.0293050
Police Academy: Mission to Moscow 6,200,000 126,247 2.0362419
True Colors 20,000,000 418,807 2.0940350
The Informers 18,000,000 382,174 2.1231889
Pontypool 1,500,000 32,118 2.1412000
Curdled 2,300,000 49,620 2.1573913
Southland Tales 17,000,000 374,743 2.2043706
Repo! The Genetic Opera 8,500,000 188,126 2.2132471
All I See Is You 30,000,000 678,150 2.2605000
Love’s Labour’s Lost 13,000,000 299,792 2.3060923
The Big Picture 5,000,000 117,463 2.3492600
Honeymoon 1,000,000 24,343 2.4343000
One from the Heart 26,000,000 636,796 2.4492154
The Isle 1,000,000 24,963 2.4963000
Ruby Cairo 24,000,000 608,866 2.5369417
After Midnight 3,000,000 76,325 2.5441667
Mini’s First Time 6,000,000 156,318 2.6053000
The Thief and the Cobbler 25,000,000 669,276 2.6771040
Return of the Living Dead III 2,000,000 54,207 2.7103500
The Indian Runner 7,000,000 191,125 2.7303571
D.E.B.S. 3,500,000 97,446 2.7841714
Roger Corman’s Frankenstein Unbound 11,500,000 334,748 2.9108522
Tideland 19,300,000 566,611 2.9358083
Jin ling shi san chai 94,000,000 2,855,644 3.0379191
Without Limits 25,000,000 777,423 3.1096920
Not Fade Away 20,000,000 636,399 3.1819950
Waking the Dead 8,500,000 270,745 3.1852353
Lady Jane 8,500,000 277,646 3.2664235
The House on Carroll Street 14,000,000 459,824 3.2844571
Body Snatchers 13,000,000 428,868 3.2989846
The Bad Batch 6,000,000 201,890 3.3648333
Valmont 33,000,000 1,132,112 3.4306424
Triple Threat 10,000,000 345,900 3.4590000
Hounddog 3,750,000 131,961 3.5189600
Nothing But the Truth 11,500,000 409,832 3.5637565
Passion 20,000,000 713,616 3.5680800
Dirty Girl 4,000,000 143,485 3.5871250
Until the End of the World 23,000,000 829,625 3.6070652
The Sea of Trees 25,000,000 906,995 3.6279800
The Rainbow 11,987,578 444,055 3.7042929

Who spends more money?

osc_bud <- osc_bud %>%
  mutate(budget = as.numeric(budget)) %>%
  mutate(gross = as.numeric(gross))
# may be at least in part redundant but to be sure numeric

avg_prof <- osc_bud %>% 
  filter(!is.na(budget)) %>% 
  filter(!is.na(gross)) %>% 
  mutate(avg_profit = (100*sum(gross))/(sum(budget))) %>% 
  mutate(percent = 100 * (gross/budget))
  
avg_profit <- head(avg_prof$avg_profit, 1)


win_prof <- osc_bud %>%
  filter(winner == T) %>%  # calculate percent profit Oscar winners
  filter(!is.na(budget)) %>%
  filter(!is.na(gross)) %>%
  mutate(winner_profit = (100*sum(gross))/(sum((budget)))) %>% 
  mutate(percent = 100 * (gross/budget))

lsr_prof <- osc_bud %>%
  filter(winner == F) %>% # calculate percent profit Oscar losers
  filter(!is.na(budget)) %>%
  filter(!is.na(gross)) %>%
  mutate(loser_profit = (100*sum(gross))/(sum(budget))) %>% 
  mutate(percent = 100 *(gross/budget))

no_nom_prof <- osc_bud %>%
  filter(is.na(winner)) %>% # calculate percent profit non-nominated
  filter(!is.na(budget)) %>%
  filter(!is.na(gross)) %>%
  mutate(nonnom_profit = (100*sum(gross))/(sum(budget))) %>% 
  mutate(percent = 100 * (gross/budget))


money_plot <- osc_bud %>% select(Title, genres, gross,
                                 budget,with_inflation,
                                 gross_inflation, winner) %>%
  ggplot() +
  geom_density(aes((budget), fill = winner), alpha = 0.4)   +
  my_theme +
  xlim(0, 60000000) +
  ylim(0, 0.00000035)  # budget density plot 


ggplotly(money_plot)
money_plot2 <- osc_bud %>% select(Title, genres,
                                  gross, with_inflation,
                                  budget, winner, 
                                  gross_inflation) %>%
  ggplot() +
  geom_density(aes(with_inflation, fill = winner), alpha = 0.4)+
  xlim(0, 60000000) +
  ylim(0, 0.00000035) +
  my_theme # budget with inflation density plot

ggplotly(money_plot2)
grid.arrange(money_plot, money_plot2, ncol=2) 

# side by side budget plots
  • The average Oscar winner budget was $82,927,713.63.
  • The median Oscar winner budget was $52,732,524.55.
  • The average Oscar loser budget was $73,041,808.78.
  • The median Oscar loser budget was $49,789,090.91.
  • The average non-nominated budget was $50,077,035.39.
  • The median non-nominated budget was $34,659,781.29.


Who makes more money?

prof_plot <- osc_bud %>% select(Title, genres, gross,
                                budget,with_inflation,
                                gross_inflation,  winner) %>% 
  ggplot() +
  geom_density(aes((gross), fill = winner), alpha = 0.4)   + 
  xlim(0, 200000000) + 
  ylim(0, 0.000000070) +
  my_theme # plot gross revenue

ggplotly(prof_plot)
prof_plot2 <- osc_bud %>% select(Title, genres, gross,
                                budget,gross_inflation, winner) %>% 
  ggplot() +
  geom_density(aes(gross_inflation, fill = winner), alpha = 0.4) +
  xlim(0, 200000000) +
  ylim(0, 0.000000070) +
  my_theme


ggplotly(prof_plot2) # plot gross inflation
grid.arrange(prof_plot, prof_plot2, ncol=2) 

# side by side gross plots
  • The average Oscar winner percent profit was 634.94.
  • The median percent profit for Oscar winners was 575.55.
  • The average Oscar loser percent profit was 444.90.
  • The median percent profit for Oscar losers was 377.84.
  • The average non-nominated percent profit was 248.09.
  • The median percent profit for non-nominated movies was 161.21.
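
The code in the previous section computes two different kinds of average: a pooled ratio (total gross over total budget) and a per-movie percent that can then be averaged or taken at the median. A minimal sketch on made-up numbers shows that the two can differ noticeably. The toy data, the `status` labels, and the output column names are all hypothetical.

```r
library(dplyr)

# Made-up movies: winner is TRUE, FALSE, or NA (not nominated)
toy <- data.frame(
  Title  = c("A", "B", "C", "D"),
  budget = c(10, 20, 5, 40),
  gross  = c(50, 30, 1, 120),
  winner = c(TRUE, FALSE, NA, TRUE)
)

summary_tbl <- toy %>%
  mutate(
    status = case_when(
      is.na(winner) ~ "non-nominated",
      winner        ~ "winner",
      TRUE          ~ "loser"
    ),
    percent = 100 * gross / budget  # per-movie gross as percent of budget
  ) %>%
  group_by(status) %>%
  summarise(
    pooled_pct = 100 * sum(gross) / sum(budget),  # total gross over total budget
    mean_pct   = mean(percent),                   # average of per-movie percents
    median_pct = median(percent),
    .groups    = "drop"
  )

print(summary_tbl)
```

For the two toy winners, the pooled figure is pulled toward the larger-budget movie, while the mean treats both movies equally, so it is worth stating which one a reported "average" refers to.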


Spent vs Made

grid.arrange(money_plot, prof_plot, ncol=2) # spent vs made charts

grid.arrange(money_plot2, prof_plot2, ncol=2)

# same chart with inflation

Best Picture Budgets with Inflation

bst_pic_budgets <- bst_pic_won %>% 
  filter(budget > 0) %>% 
  mutate(year_title = paste0(year, ', ', Title)) # paste0 avoids stray spaces
# drop the few best pictures (fewer than 10, I think) with 0 or NA budget

r <- bst_pic_budgets %>%  
  ggplot() +
  geom_col(aes(x = year_title, y = with_inflation), fill= blue) +
  my_theme  # best picture column plot

ggplotly(r) 
min_bud <- bst_pic_won %>% 
  ungroup() %>% # bst_pic_won is still grouped by Title
  filter(with_inflation > 0) %>% 
  select(Title, with_inflation) %>% 
  arrange(with_inflation) # all priced best pic winners, cheapest first

min_bud_title <- head(min_bud$Title, 1) # title of min cost best pic
min_bud_cost <- head(min_bud$with_inflation, 1) # min cost best pic

avg_bst <- mean(min_bud$with_inflation) # average cost of best pic

max_bud <- min_bud %>% 
  arrange(desc(with_inflation)) # most expensive best pic first

max_bud_title <- head(max_bud$Title, 1) # Title of most expensive
max_bud_cost <- head(max_bud$with_inflation, 1) # budget of most expensive

The cheapest Best Picture winner, adjusted for inflation, was Marty, with a budget of $3,674,769.95. The average inflation-adjusted budget for a Best Picture winner was $51,335,637.82. The most expensive Best Picture winner, adjusted for inflation, was Titanic, with a budget of $358,241,758.24.


Summary

I wanted to look at the relationship between the Oscars and money. Looking at the data, it seems clear that Oscar-winning movies tend both to cost more to make and to have higher profits than Oscar-losing movies. Following that trend, Oscar-losing movies tend to cost more and have higher profits than movies that aren’t nominated at all. These results were not surprising, as I would hope that award-winning movies would tend to be “better” than movies that win no awards. Still, it is hard to say which comes first. Further investigation into how the money for these movies was spent might shed more light on an answer. It is also clear from the Best Picture column chart that a high budget is not a requirement for being an Oscar winner. Lastly, I’d just like to add that the movie business seems to be very lucrative. The average percent profit for these movies was 365.32. In essence, this means that a $50,000,000 movie will gross on average $182,659,561.70.